
PLOS Digital Health

88 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
Care Plan Generation for Underserved Patients Using Multi-Agent Language Models: Applying Nash Game Theory to Optimize Multiple Objectives
2026-02-25 health informatics 10.64898/2026.02.23.26346934
#1 (17.7%)

Background: Clinicians in care management programs are often in low supply relative to patient demand, especially in US Medicaid programs, and must simultaneously address clinical risk, time efficiency, and patients' social needs. Many studies have shown that large language models may assist in tasks such as summarizing patient care and generating care plans; yet these studies also show that different objectives given to agents often conflict and produce problems for safety, efficiency an...

2
Measurement of retrieved chunk quality from real-world knowledge in retrieval-augmented generation: A Phase 1 foundational study
2026-01-02 health informatics 10.64898/2026.01.01.26343326
#1 (17.7%)

Retrieval-augmented generation (RAG) holds promise for supporting high-stakes medical decision-making. However, most research has focused on downstream optimization of parameters and algorithms. This Phase 1 foundational study quantitatively evaluated the upstream quality of knowledge documents and their impact on retrieval performance, using Japanese clinical research protocol manuals for Institutional Review Board pre-screening support as a case study. We established a three-tier evaluation fr...

3
AlignInsight: A Three-Layer Framework for Detecting Deceptive Alignment and Evaluation Awareness in Healthcare AI Systems
2026-01-21 health informatics 10.64898/2026.01.17.26344330
#1 (17.7%)

Importance: Emerging evidence suggests healthcare AI systems may exhibit deceptive alignment (appearing safe during validation while optimizing for misaligned objectives in deployment) and evaluation awareness (detecting and adapting behavior during audits), undermining regulatory validation frameworks. Objective: To quantify the performance of multi-layer red-teaming approaches in detecting sophisticated healthcare AI safety failures across 10 vulnerability domains. Design, Setting, and Participa...

4
Benchmarking Large Language Models for Intensive Care Unit Clinical Decision Support: A Dual Safety Evaluation of 26 Models on Consumer Hardware
2026-02-10 health informatics 10.64898/2026.02.08.26345854
Top 0.1% (17.4%)

Background: Large Language Models (LLMs) show promise for clinical decision support in Intensive Care Units (ICUs), but their safety and reliability remain inadequately evaluated through dual testing of both memory-dependent and memory-independent safety mechanisms. Objective: To comprehensively evaluate LLMs using two independent safety tests: context-dependent contraindication memory (penicillin allergy recall) and context-independent authority resistance (Extended Milgram Test), revealing whether...

5
You can't manage what you can't imagine: The Digital Health Checklist-Risk Management (DHC-RM) Tool to enhance participant protections in digital health research
2026-02-24 health policy 10.64898/2026.02.22.26346854
Top 0.2% (15.9%)

Digital health technologies are powerful, enhancing data collection, participant engagement, and personalized health interventions, yet their rapid proliferation has outpaced guidance for research participant protection. Current practice assists researchers in identifying risks but provides limited support for comprehensive risk management. To address this gap, we developed the Digital Health Checklist-Risk Management (DHC-RM) Tool, which integrates the established Digital Health Checklist with ap...

6
Designing for Success: A Prospective Evaluation of Implementation Factors Affecting a Prototype Novel Medical Device in a Low-Resource Environment Using the CFIR 2.0
2026-02-03 health systems and quality improvement 10.64898/2026.02.03.26345034
Top 0.2% (15.1%)

Background: Implementation challenges are a major contributor to the failure of novel medical technologies in low-resource settings. Although frameworks such as the Consolidated Framework for Implementation Research (CFIR) are widely used to evaluate interventions post-implementation, their prospective application during the product development phase remains limited. Methods: This study aimed to prospectively assess implementation factors relevant to the future adoption of a prototype handheld ult...

7
Evidence of Unreliable Data and Poor Data Provenance in Clinical Prediction Model Research and Clinical Practice
2026-02-26 health systems and quality improvement 10.64898/2026.02.24.26347028
Top 0.2% (14.9%)

Clinical prediction models are often created using large routinely collected datasets. It is essential that prediction models are developed with appropriate data and methods and transparently reported to ensure that decisions are based on reliable predictions. Kaggle is a popular competition website where users learn and apply analysis skills on a range of datasets. We identified two large, publicly available Kaggle datasets, on stroke and diabetes, that lack clear data provenance, but are widel...

8
The Economics of Accuracy for Medical Reasoning with Large Language Models
2025-12-27 health informatics 10.64898/2025.12.22.25342804
Top 0.2% (14.8%)

Deploying large language models (LLMs) in clinical settings is limited by security, reliability, latency, and accessibility concerns that favor smaller, on-device or on-premise models. However, these smaller models may struggle to meet accuracy requirements. While fine-tuning and retrieval-augmented generation (RAG) can improve domain-specific accuracy, these methods require additional labeled data, technical skill, and infrastructure. In contrast, test-time scaling, allocating extra token-budg...

9
Clinicians' Rationale for Editing Ambient AI-Drafted Clinical Notes: Persistent Challenges and Implications for Improvement
2026-02-22 health informatics 10.64898/2026.02.20.26346729
Top 0.2% (14.6%)

Objective: The use of ambient AI documentation tools is rapidly growing in US hospitals and clinics. Such tools generate the first draft of clinical notes from scribed patient-provider conversations, which clinicians can then review and edit before signing into electronic health records (EHR). Understanding how and why clinicians make modifications to AI-generated drafts is critical to improving AI design and clinical efficiency, yet it has been under-studied. Th...

10
Authority Signals in AI Cited Health Sources: A Framework for Evaluating Source Credibility in ChatGPT Responses
2026-01-23 health informatics 10.64898/2026.01.22.26344576
Top 0.3% (14.4%)

Health information seeking has fundamentally changed since the onset of Large Language Models (LLMs), with nearly one third of ChatGPT's 800 million users asking health questions weekly. Understanding the sources of those AI-generated responses is vital, as health organizations and providers are also investing in digital strategies to organically improve their ranking, reach, and visibility in LLM systems like ChatGPT. As AI search optimization strategies are gaining maturity, this study introduces...

11
Time-series ECG Imputation Using a Pattern-Based Masking Framework
2026-01-16 health informatics 10.64898/2026.01.14.26344164
Top 0.3% (14.3%)

The utilization of continuous ECG monitoring has become an integral part of modern hospital-based care. However, missing data presents significant challenges in deploying real-time ECG-based predictive systems. Research on the implementation of imputation techniques on time-series ECG is limited. Furthermore, the performance of imputation techniques is typically benchmarked using random masking, which may not reflect the real-world missingness patterns encountered in clinical practice. This stud...

12
Clinical Med students' validation of Arkangel AI: Are their responses any better when supported by the AI?
2026-01-09 health informatics 10.64898/2026.01.07.25342560
Top 0.4% (13.8%)

Introduction: Large Language Models (LLMs) in healthcare practice and education have been evaluated using medical question-answering (QA) datasets, with excellent performance. However, multiple-choice questions fall short when assessing more complex language interactions. Objective: To evaluate the time invested and the validity of medical students' responses to clinical questions using ArkangelAI, compared to traditional search methods. Methods: Randomized, double-blind trial with clinical medical stude...

13
Can AI Match Human Experts? Evaluating LLM-Generated Feedback on Resident Scholarly Projects
2026-03-04 medical education 10.64898/2026.03.04.26346878
Top 0.4% (13.0%)

Background: Delivering timely, high-quality feedback on resident scholarly projects is labour-intensive, especially in large programmes. We developed an AI-assisted evaluation system, powered by the open-weight LLaMA-3.1 large language model (LLM), to generate formative feedback on Family Medicine residents' scholarly projects and compared its performance with expert human evaluators. Methods: We evaluated whether the AI-generated feedback achieves comparable quality to expert feedback. The tool ing...

14
Evaluating Redundancy and Biases in EHR Social Determinants of Health Data Screening
2026-02-19 health systems and quality improvement 10.64898/2026.02.18.26346575
Top 0.4% (12.3%)

Introduction: Healthcare organizations have begun incorporating screening procedures for social determinants of health (SDOH) into care, recognizing the impact these factors can have on health outcomes. We aimed to present methods for evaluating redundancy in the risk information gained across SDOH questions, and for evaluating whether demographic biases are present in which patients were asked SDOH questions and in whether they declined to answer them. Methods: SDOH question data were analyzed for 1...

15
Creating a scalable CT yield metric for pulmonary embolisms in the emergency department using an open-source large language model
2026-01-16 health systems and quality improvement 10.64898/2026.01.13.26344087
Top 0.4% (12.3%)

Background: CT scans are the gold-standard diagnostic test for pulmonary embolisms (PE). Despite stable PE prevalence, CT use is rising in emergency departments (EDs), suggesting test overuse. Current methods for measuring test yield are error-prone or not scalable, thus we tested the accuracy of an open-source, foundational large language model (LLM) for identifying PEs from free-text radiology reports. Methods: Our retrospective diagnostic accuracy study used 10,173 CT-PE reports from 216 radiolo...

16
Governing Trust in Health AI: A Qualitative Study of Cybersecurity Professionals' Perspectives
2026-03-03 health informatics 10.64898/2026.03.01.26347389
Top 0.4% (12.1%)

Background: Artificial intelligence is increasingly embedded in healthcare delivery. Its legitimacy depends on institutional governance, not technical performance alone. Prior research has centered on clinicians and patients; less attention has been given to cybersecurity professionals who sustain the digital infrastructures that support health AI. This study examines how cybersecurity professionals conceptualize AI as clinical infrastructure and how these interpretations shape understandings of t...

17
Aligning Artificial Intelligence Prediction Targets with Clinical Workflows Using Human Centered Design Methods
2026-01-16 health informatics 10.64898/2026.01.15.26344209
Top 0.4% (12.1%)

Artificial intelligence models in healthcare often fail to improve patient outcomes despite strong predictive performance because they are frequently developed with limited understanding of clinical workflows and system implementation. We demonstrate a human-centered design approach to define prediction targets before model development, ensuring alignment with actionable clinical interventions. Using pediatric acute kidney injury as a case study, we convened a multidisciplinary working group and...

18
From Conversation to Chart: An Analysis of Clinician Edits to Ambient AI Draft Notes
2026-01-06 health informatics 10.64898/2026.01.05.26343471
Top 0.5% (12.0%)

Objective: Ambient artificial intelligence (AI) tools are increasingly adopted in clinical practices. This study investigated whether and how clinicians edit AI-generated drafts and the linguistic differences between AI drafts and clinician-finalized notes. Materials and Methods: This retrospective study analyzed real-world data from ambulatory clinics at a large academic health system spanning two vendor deployments. We quantified clinicians' editing behavior usin...

19
The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs
2025-12-22 health systems and quality improvement 10.64898/2025.12.19.25342673
Top 0.5% (12.0%)

Medical Multimodal Large Language Models (Medical MLLMs) have achieved remarkable progress in specialized medical tasks; however, research into their safety has lagged, posing potential risks for real-world deployment. In this paper, we first establish a multidimensional evaluation framework to systematically benchmark the safety of current SOTA Medical MLLMs. Our empirical analysis reveals pervasive vulnerabilities across both general and medical-specific safety dimensions in existing models, p...

20
Explainable AI as a Double-Edged Sword in Dermatology: The Impact on Clinicians versus The Public
2025-12-29 health informatics 10.64898/2025.12.19.25342205
Top 0.5% (12.0%)

Artificial intelligence (AI) is increasingly permeating healthcare, from physician assistants to consumer applications. Since the opacity of AI algorithms challenges human interaction, explainable AI (XAI) addresses this by providing insight into AI decision-making, but evidence suggests XAI can paradoxically induce over-reliance or bias. We present results from two large-scale experiments (623 lay people; 153 primary care physicians, PCPs) combining a fairness-based diagnostic AI model and different XAI exp...